EDA Project¶
Project Proposal¶
The dataset contains reported crime incidents in Chicago from 2001 to the present, extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. https://catalog.data.gov/dataset/crimes-2001-to-present
My family and I are moving to Chicago, and I would like to be aware of when crime increases and decreases and which crimes are the most common to make the most suitable judgment to keep me and my family safe. I am also curious to discover what crime will look like in the next five years based on patterns from the dataset.
The fields that will be most helpful:
- location
- year
- primary_type
I aim to learn where crime is most concentrated, the ratio of crimes to arrests, and the relationship between location and crime type. This information will help my family and me decide where to relocate.
#Load up modules
import pandas as pd
import numpy as np
#set up notebook to display multiple output in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
crimes = pd.read_csv('crimes.csv',sep= ',')
crimes.info()
crimes.head()
crimes.tail()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   ID                    65499 non-null  int64
 1   Case Number           65499 non-null  object
 2   Date                  65499 non-null  object
 3   Block                 65499 non-null  object
 4   IUCR                  65499 non-null  object
 5   Primary Type          65499 non-null  object
 6   Description           65499 non-null  object
 7   Location Description  65084 non-null  object
 8   Arrest                65499 non-null  bool
 9   Domestic              65499 non-null  bool
 10  Beat                  65499 non-null  int64
 11  District              65499 non-null  int64
 12  Ward                  65458 non-null  float64
 13  Community Area        65463 non-null  float64
 14  FBI Code              65499 non-null  object
 15  X Coordinate          64762 non-null  float64
 16  Y Coordinate          64762 non-null  float64
 17  Year                  65499 non-null  int64
 18  Updated On            65499 non-null  object
 19  Latitude              64762 non-null  float64
 20  Longitude             64762 non-null  float64
 21  Location              64762 non-null  object
dtypes: bool(2), float64(6), int64(4), object(10)
memory usage: 10.1+ MB
| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5741943 | HN549294 | 08/25/2007 09:22:18 AM | 074XX N ROGERS AVE | 560 | ASSAULT | SIMPLE | OTHER | False | False | ... | 49.0 | 1.0 | 08A | NaN | NaN | 2007 | 08/17/2015 03:03:40 PM | NaN | NaN | NaN |
| 1 | 25953 | JE240540 | 05/24/2021 03:06:00 PM | 020XX N LARAMIE AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | True | False | ... | 36.0 | 19.0 | 01A | 1141387.0 | 1913179.0 | 2021 | 11/18/2023 03:39:49 PM | 41.917838 | -87.755969 | (41.917838056, -87.755968972) |
| 2 | 26038 | JE279849 | 06/26/2021 09:24:00 AM | 062XX N MC CORMICK RD | 110 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | False | ... | 50.0 | 13.0 | 01A | 1152781.0 | 1941458.0 | 2021 | 11/18/2023 03:39:49 PM | 41.995219 | -87.713355 | (41.995219444, -87.713354912) |
| 3 | 13279676 | JG507211 | 11/09/2023 07:30:00 AM | 019XX W BYRON ST | 620 | BURGLARY | UNLAWFUL ENTRY | APARTMENT | False | False | ... | 47.0 | 5.0 | 5 | 1162518.0 | 1925906.0 | 2023 | 11/18/2023 03:39:49 PM | 41.952345 | -87.677975 | (41.952345086, -87.677975059) |
| 4 | 13274752 | JG501049 | 11/12/2023 07:59:00 AM | 086XX S COTTAGE GROVE AVE | 454 | BATTERY | AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MIN... | SMALL RETAIL STORE | True | False | ... | 6.0 | 44.0 | 08B | 1183071.0 | 1847869.0 | 2023 | 12/09/2023 03:41:24 PM | 41.737751 | -87.604856 | (41.737750767, -87.604855911) |
5 rows × 22 columns
| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 65494 | 13260483 | JG484114 | 10/30/2023 03:15:00 AM | 003XX W 42ND PL | 910 | MOTOR VEHICLE THEFT | AUTOMOBILE | STREET | True | False | ... | 20.0 | 37.0 | 7 | 1174729.0 | 1876779.0 | 2023 | 11/07/2023 03:41:07 PM | 41.817273 | -87.634558 | (41.817272712, -87.634557976) |
| 65495 | 13260346 | JG483955 | 10/30/2023 01:25:00 AM | 071XX S DR MARTIN LUTHER KING JR DR | 1310 | CRIMINAL DAMAGE | TO PROPERTY | APARTMENT | False | False | ... | 6.0 | 69.0 | 14 | 1180150.0 | 1857738.0 | 2023 | 11/07/2023 03:41:07 PM | 41.764900 | -87.615256 | (41.764899756, -87.615255831) |
| 65496 | 13260937 | JG484816 | 10/30/2023 03:30:00 PM | 112XX S HOMEWOOD AVE | 460 | BATTERY | SIMPLE | RESIDENCE | False | False | ... | 19.0 | 75.0 | 08B | 1165410.0 | 1830013.0 | 2023 | 11/07/2023 03:41:07 PM | 41.689143 | -87.670065 | (41.689143038, -87.670065135) |
| 65497 | 13261032 | JG484978 | 10/30/2023 07:00:00 AM | 003XX E ERIE ST | 910 | MOTOR VEHICLE THEFT | AUTOMOBILE | STREET | False | False | ... | 2.0 | 8.0 | 7 | 1178676.0 | 1904855.0 | 2023 | 11/07/2023 03:41:07 PM | 41.894226 | -87.619223 | (41.894226067, -87.619222865) |
| 65498 | 13260611 | JG484346 | 10/30/2023 10:35:00 AM | 052XX N BROADWAY | 560 | ASSAULT | SIMPLE | SMALL RETAIL STORE | True | False | ... | 48.0 | 77.0 | 08A | 1167370.0 | 1934861.0 | 2023 | 11/07/2023 03:41:07 PM | 41.976815 | -87.659880 | (41.976814727, -87.659880317) |
5 rows × 22 columns
EDA Phase 1¶
- I hope to learn about which crimes are most common. I plan to break down each crime and organize them into years and exact locations.
- I have a hunch that burglary will be the most common crime, based on what I have seen in the news over the years. Crime in Chicago is widely talked about, although, in my experience, word of mouth rarely specifies which crimes. I also expect burglary to be the most reported crime within the most recent five years and, based on patterns in the data, to remain the most common over the next five years.
- The crimes in this dataset were reported in the City of Chicago from 2001 to the present.
- The total sample size before cleaning data is 65,499 reported crimes.
- There is nothing particular about the data.
crimes = pd.read_csv('crimes.csv')
crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   ID                    65499 non-null  int64
 1   Case Number           65499 non-null  object
 2   Date                  65499 non-null  object
 3   Block                 65499 non-null  object
 4   IUCR                  65499 non-null  object
 5   Primary Type          65499 non-null  object
 6   Description           65499 non-null  object
 7   Location Description  65084 non-null  object
 8   Arrest                65499 non-null  bool
 9   Domestic              65499 non-null  bool
 10  Beat                  65499 non-null  int64
 11  District              65499 non-null  int64
 12  Ward                  65458 non-null  float64
 13  Community Area        65463 non-null  float64
 14  FBI Code              65499 non-null  object
 15  X Coordinate          64762 non-null  float64
 16  Y Coordinate          64762 non-null  float64
 17  Year                  65499 non-null  int64
 18  Updated On            65499 non-null  object
 19  Latitude              64762 non-null  float64
 20  Longitude             64762 non-null  float64
 21  Location              64762 non-null  object
dtypes: bool(2), float64(6), int64(4), object(10)
memory usage: 10.1+ MB
Initial Observations¶
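- ID, Case Number, Date, Block, IUCR, Primary Type, and Description are fully populated (65,499 non-null values each). The most nulls are in X Coordinate, Y Coordinate, Latitude, Longitude, and Location (737 each), followed by Location Description (415), Ward (41), and Community Area (36).
- Some of the data is not needed for my goals, so I will need to eliminate 16 of the 22 columns.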
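Missing values per column can be derived from the `info()` output by subtracting each non-null count from the 65,499 rows. A minimal sketch, with the counts copied from that output (the names `non_null` and `missing` are only for illustration):

```python
# Non-null counts copied from crimes.info(); the frame has 65,499 rows.
total_rows = 65499
non_null = {
    'Location Description': 65084,
    'Ward': 65458,
    'Community Area': 65463,
    'X Coordinate': 64762,
    'Y Coordinate': 64762,
    'Latitude': 64762,
    'Longitude': 64762,
    'Location': 64762,
}

# Missing values per column = total rows minus the non-null count.
missing = {col: total_rows - n for col, n in non_null.items()}

# Print columns from most to least missing.
for col, n in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{col}: {n}')
```

Columns not listed in the dictionary have no missing values.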
# unique crime types
crimes['Primary Type'].unique()
# five most frequent crime types
crimes['Primary Type'].value_counts().head(5)
array(['ASSAULT', 'HOMICIDE', 'BURGLARY', 'BATTERY', 'THEFT',
'CRIMINAL DAMAGE', 'DECEPTIVE PRACTICE', 'MOTOR VEHICLE THEFT',
'CRIMINAL SEXUAL ASSAULT', 'OFFENSE INVOLVING CHILDREN', 'ROBBERY',
'OTHER OFFENSE', 'SEX OFFENSE', 'WEAPONS VIOLATION', 'STALKING',
'OBSCENITY', 'CRIMINAL TRESPASS', 'PROSTITUTION', 'ARSON',
'NARCOTICS', 'KIDNAPPING', 'CONCEALED CARRY LICENSE VIOLATION',
'INTERFERENCE WITH PUBLIC OFFICER', 'PUBLIC PEACE VIOLATION',
'LIQUOR LAW VIOLATION', 'NON-CRIMINAL', 'INTIMIDATION',
'HUMAN TRAFFICKING', 'GAMBLING', 'OTHER NARCOTIC VIOLATION',
'CRIM SEXUAL ASSAULT'], dtype=object)
Primary Type
THEFT                  14831
BATTERY                11094
CRIMINAL DAMAGE         7104
MOTOR VEHICLE THEFT     6243
ASSAULT                 5801
Name: count, dtype: int64
# unique crime descriptions
crimes['Description'].unique()
# five most frequent crime descriptions
crimes['Description'].value_counts().head(5)
array(['SIMPLE', 'FIRST DEGREE MURDER', 'UNLAWFUL ENTRY',
'AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MINOR INJURY',
'$500 AND UNDER', 'TO VEHICLE', 'THEFT BY LESSEE, MOTOR VEHICLE',
'DOMESTIC BATTERY SIMPLE', 'AUTOMOBILE',
'AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SERIOUS INJURY',
'FORGERY', 'FINANCIAL IDENTITY THEFT OVER $ 300',
'ILLEGAL USE CASH CARD', 'NON-AGGRAVATED', 'TO PROPERTY',
'OVER $500', 'FROM BUILDING', 'BOGUS CHECK', 'CHILD ABDUCTION',
'ATTEMPT - FINANCIAL IDENTITY THEFT', 'ARMED - HANDGUN',
'PREDATORY', 'VEHICULAR HIJACKING',
'SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER',
'HARASSMENT BY ELECTRONIC MEANS',
'AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER',
'SEXUAL EXPLOITATION OF A CHILD',
'AGGRAVATED CRIMINAL SEXUAL ABUSE', 'RECKLESS FIREARM DISCHARGE',
'CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER',
'AGGRAVATED SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER',
'CYBERSTALKING', 'AGGRAVATED - OTHER', 'RETAIL THEFT',
'SALE / DISTRIBUTE OBSCENE MATERIAL TO MINOR', 'TO STATE SUP LAND',
'THEFT OF LOST / MISLAID PROPERTY', 'TO LAND',
'ATTEMPT - AUTOMOBILE',
'AGGRAVATED DOMESTIC BATTERY - OTHER DANGEROUS WEAPON',
'OTHER VEHICLE OFFENSE', 'FALSE / STOLEN / ALTERED TRP',
'AGGRAVATED - KNIFE / CUTTING INSTRUMENT',
'VEHICLE TITLE / REGISTRATION OFFENSE', 'CHILD ABUSE',
'CREDIT CARD FRAUD', 'FINANCIAL IDENTITY THEFT $300 AND UNDER',
'FRAUD OR CONFIDENCE GAME', 'AGGRAVATED - HANDGUN',
'ARMED - OTHER FIREARM', 'AGGRAVATED VEHICULAR HIJACKING',
'ATTEMPT FORCIBLE ENTRY', 'FORCIBLE ENTRY',
'VIOLATE ORDER OF PROTECTION', 'UNLAWFUL POSSESSION - HANDGUN',
'HARASSMENT BY TELEPHONE', 'THEFT FROM MOTOR VEHICLE',
'STRONG ARM - NO WEAPON', 'SOLICITING FOR BUSINESS', 'BY FIRE',
'TO STATE SUPPORTED PROPERTY', 'TELEPHONE THREAT',
'POSSESS - CRACK', 'TO RESIDENCE', 'THEFT / RECOVERY - AUTOMOBILE',
'ENDANGERING LIFE / HEALTH OF CHILD',
'AGGRAVATED - OTHER DANGEROUS WEAPON', 'UNLAWFUL RESTRAINT',
'AGGRAVATED', 'ARMED - OTHER DANGEROUS WEAPON',
'CRIMINAL SEXUAL ABUSE', 'PROHIBITED PLACES', 'OTHER OFFENSE',
'COUNTERFEITING DOCUMENT', 'TO CITY OF CHICAGO PROPERTY',
'THEFT OF LABOR / SERVICES', 'POSSESS - HEROIN (WHITE)',
'RESIST / OBSTRUCT / DISARM OFFICER',
'NON-CONSENSUAL DISSEMINATION OF PRIVATE SEXUAL IMAGES',
'BOMB THREAT',
'FINANCIAL EXPLOITATION OF AN ELDERLY OR DISABLED PERSON',
'AGGRAVATED - HANDS, FISTS, FEET, SERIOUS INJURY',
'OTHER VIOLATION', 'CYCLE, SCOOTER, BIKE WITH VIN',
'ATTEMPT ARMED - HANDGUN', 'POCKET-PICKING',
'PROTECTED EMPLOYEE - HANDS, FISTS, FEET, NO / MINOR INJURY',
'UNLAWFUL USE - OTHER FIREARM',
'AGG. PROTECTED EMPLOYEE - HANDS, FISTS, FEET, SERIOUS INJURY',
'POSSESS - BARBITURATES', 'OTHER CRIME AGAINST PERSON',
'ATTEMPT AGGRAVATED', 'HOME INVASION',
'ILLEGAL POSSESSION CASH CARD', 'LICENSE VIOLATION',
'STOLEN PROPERTY BUY / RECEIVE / POSSESS',
'POSSESS - CANNABIS MORE THAN 30 GRAMS',
'ATTEMPT - CYCLE, SCOOTER, BIKE WITH VIN',
'SOLICIT NARCOTICS ON PUBLIC WAY', 'COMPUTER FRAUD',
'POSSESS - COCAINE', 'POSSESS - SYNTHETIC DRUGS',
'VIOLATION OF STALKING NO CONTACT ORDER',
'AGGRAVATED FINANCIAL IDENTITY THEFT',
'UNLAWFUL USE - OTHER DANGEROUS WEAPON',
'MANUFACTURE / DELIVER - CANNABIS OVER 10 GRAMS',
'AGGRAVATED OF A SENIOR CITIZEN', 'POSSESS - AMPHETAMINES',
'AGGRAVATED DOMESTIC BATTERY - KNIFE / CUTTING INSTRUMENT',
'ARMED - KNIFE / CUTTING INSTRUMENT', 'FALSE POLICE REPORT',
'AGGRAVATED POLICE OFFICER - OTHER DANGEROUS WEAPON',
'OBSTRUCTING IDENTIFICATION', 'RECKLESS CONDUCT',
'BURGLARY FROM MOTOR VEHICLE',
'AGGRAVATED PROTECTED EMPLOYEE - OTHER DANGEROUS WEAPON',
'AGGRAVATED - HANDS, FISTS, FEET, NO / MINOR INJURY',
'FORFEIT PROPERTY', 'AGGRAVATED - OTHER FIREARM',
'ATTEMPT AGGRAVATED CRIMINAL SEXUAL ABUSE',
'ATTEMPT STRONG ARM - NO WEAPON', 'ATTEMPT ARSON',
'LIQUOR LICENSE VIOLATION', 'CRIMINAL DEFACEMENT',
'GUN OFFENDER - ANNUAL REGISTRATION', 'EMBEZZLEMENT',
'CHILD PORNOGRAPHY', 'ATTEMPT THEFT', 'COUNTERFEIT CHECK', 'OTHER',
'PUBLIC INDECENCY', 'FOUND SUSPECT NARCOTICS',
'AGGRAVATED DOMESTIC BATTERY - HANDGUN', 'OBSTRUCTING SERVICE',
'GUN OFFENDER - DUTY TO REGISTER', 'POSSESSION OF BURGLARY TOOLS',
'ATTEMPT ARMED - KNIFE / CUTTING INSTRUMENT', 'PURSE-SNATCHING',
'VIOLATION GPS MONITORING DEVICE', 'AGGRAVATED OF A CHILD',
'POSSESS - CANNABIS 30 GRAMS OR LESS', 'ANIMAL ABUSE / NEGLECT',
'OBSCENE TELEPHONE CALLS',
'THEFT / RECOVERY - CYCLE, SCOOTER, BIKE WITH VIN',
'AGGRAVATED POLICE OFFICER - HANDGUN',
'OTHER CRIME INVOLVING PROPERTY', 'KIDNAPPING',
'OBSTRUCTING JUSTICE', 'CONCEALED CARRY LICENSE REVOCATION',
'ARMED WHILE UNDER THE INFLUENCE', 'EMPLOY MINOR',
'UNLAWFUL POSSESSION - AMMUNITION', 'POSSESSION OF DRUG EQUIPMENT',
'AGGRAVATED POLICE OFFICER - HANDS, FISTS, FEET, NO INJURY',
'PEEPING TOM', 'OF AN UNBORN CHILD',
'SELL / GIVE / DELIVER LIQUOR TO MINOR', 'INTIMIDATION',
'POSSESS - HALLUCINOGENS', 'POSSESS - PCP',
'CHILD ABDUCTION / STRANGER', 'DECEPTIVE COLLECTION PRACTICES',
'STATE BENEFITS FRAUD', 'TRUCK, BUS, MOTOR HOME',
'MANUFACTURE / DELIVER - CRACK',
'VIOLENT OFFENDER - ANNUAL REGISTRATION',
'POST GRAPHIC INFO PORGNOGRAPHIC INTERNET OR POSS GRAPHIC INF',
'ALTER / FORGE PRESCRIPTION', 'UNLAWFUL USE - HANDGUN',
'AGGRAVATED POLICE OFFICER - KNIFE / CUTTING INSTRUMENT',
'AGGRAVATED COMPUTER TAMPERING',
'CONTRIBUTE TO THE DELINQUENCY OF CHILD', 'ARMED: HANDGUN',
'INDECENT SOLICITATION OF A CHILD',
'AGGRAVATED P.O. - HANDS, FISTS, FEET, SERIOUS INJURY',
'SEX OFFENDER - FAIL TO REGISTER',
'VIOLATION OF CIVIL NO CONTACT ORDER', 'INVOLUNTARY SERVITUDE',
'SOLICIT ON PUBLIC WAY', 'GAME/DICE', 'ATTEMPT NON-AGGRAVATED',
'AGGRAVATED PROTECTED EMPLOYEE - KNIFE / CUTTING INSTRUMENT',
'UNAUTHORIZED VIDEOTAPING',
'MANUFACTURE / DELIVER - HEROIN (WHITE)',
'THEFT / RECOVERY - TRUCK, BUS, MOBILE HOME',
'POSSESS - HYPODERMIC NEEDLE', 'COMMERCIAL SEX ACTS',
'CYCLE, SCOOTER, BIKE NO VIN',
'VIOLENT OFFENDER - DUTY TO REGISTER',
'UNLAWFUL USE OF A COMPUTER', 'CHILD ABANDONMENT',
'POSSESS - METHAMPHETAMINE', 'ESCAPE', 'ARMED VIOLENCE',
'OTHER WEAPONS VIOLATION', 'ARSON THREAT', 'OBSCENE MATTER',
'SEX OFFENDER - FAIL TO REGISTER NEW ADDRESS',
'ATTEMPT ARMED - OTHER DANGEROUS WEAPON', 'EXTORTION',
'MANUFACTURE / DELIVER - PCP',
'POSSESS - HEROIN (TAN / BROWN TAR)', 'POSS: COCAINE',
'UNLAWFUL SALE - HANDGUN',
'MANUFACTURE / DELIVER - CANNABIS 10 GRAMS OR LESS',
'EAVESDROPPING', 'MANUFACTURE / DELIVER - BARBITURATES',
'AGGRAVATED PROTECTED EMPLOYEE - HANDGUN', 'IMPERSONATION',
'POSS: HEROIN(WHITE)', 'POSS: CRACK',
'UNLAWFUL VISITATION INTERFERENCE',
'ATTEMPT AGGRAVATED - KNIFE / CUTTING INSTRUMENT',
'INSURANCE FRAUD',
'GUN OFFENDER - DUTY TO REPORT CHANGE OF INFORMATION',
'FROM COIN-OPERATED MACHINE OR DEVICE', 'SOLICIT OFF PUBLIC WAY',
'UNLAWFUL POSSESSION - OTHER FIREARM',
'POSSESS FIREARM / AMMUNITION - NO FOID CARD',
'THEFT BY LESSEE, NON-MOTOR VEHICLE', 'DELIVERY CONTAINER THEFT',
'ATTEMPT AGGRAVATED - OTHER', 'TAMPER WITH MOTOR VEHICLE',
'SOLICITATION OF A SEXUAL ACT', 'BOARD PLANE WITH WEAPON',
'PUBLIC AID WIRE/MAIL FRAUD - VIA MAIL/PACKAGE/DELIVERY SYS',
'MANUFACTURE / DELIVER - METHAMPHETAMINE', 'TO AIRPORT',
'ATTEMPT ARMED - OTHER FIREARM',
'TIRE DEFLATION DEVICE DEPLOYMENT', 'PAROLE VIOLATION',
'AGGRAVATED: HANDGUN', 'UNLAWFUL USE / SALE OF AIR RIFLE',
'POSSESSION OF PORNOGRAPHIC PRINT',
'VIOLATION OF BAIL BOND - DOMESTIC VIOLENCE',
'POSSESSION - EXPLOSIVE / INCENDIARY DEVICE',
'CRIMINAL DRUG CONSPIRACY', 'WIC FRAUD',
'VIOLATION OF SUMMARY CLOSURE', 'ATTEMPT - TRUCK, BUS, MOTOR HOME',
'THEFT / RECOVERY - CYCLE, SCOOTER, BIKE NO VIN', 'CALL OPERATION',
'MANUFACTURE / DELIVER - SYNTHETIC DRUGS',
'SEX OFFENDER - PROHIBITED ZONE', 'INTERFERENCE JUDICIAL PROCESS',
'ATTEMPT CRIMINAL SEXUAL ABUSE', 'SEXUAL RELATIONS IN FAMILY',
'MANUFACTURE / DELIVER - COCAINE', 'INSTITUTIONAL VANDALISM',
'AGG CRIM SEX ABUSE - VIC 13-16 YOA - OFF 5 YR OLDER PENETRAT',
'MANUFACTURE / DELIVER - HEROIN (TAN / BROWN TAR)',
'ILLEGAL POSSESSION BY MINOR', 'STRONGARM - NO WEAPON',
'CRIMINAL SEXUAL ABUSE - SEXUAL PENETRATION',
'HAZARDOUS MATERIALS VIOLATION', 'ATTEMPT AGGRAVATED - HANDGUN',
'OTHER PROSTITUTION OFFENSE', 'ABUSE / NEGLECT - CARE FACILITY',
'MANU/DELIVER:CRACK', 'CANNABIS PLANT', 'MOB ACTION',
'TO FIRE FIGHT.APP.EQUIP', 'OBSCENITY', 'RECKLESS HOMICIDE',
'MANUFACTURE / DELIVER - HALLUCINOGEN', 'INTOXICATING COMPOUNDS',
'FORCIBLE DETENTION', 'ATT: AUTOMOBILE', '$300 AND UNDER',
'ATTEMPT - CYCLE, SCOOTER, BIKE NO VIN',
'OTHER ARSON / EXPLOSIVE INCIDENT', 'PUBLIC DEMONSTRATION',
'BIGAMY', 'AGGRAVATED PROTECTED EMPLOYEE - OTHER FIREARM',
'VIOLENT OFFENDER - FAIL TO REGISTER NEW ADDRESS',
'AGGRAVATED POLICE OFFICER - OTHER FIREARM',
'ATTEMPT POSSESSION CANNABIS',
'MANUFACTURE / DELIVER - AMPHETAMINES',
'MANUFACTURE / DELIVER - SYNTHETIC MARIJUANA'], dtype=object)
Description
SIMPLE                     7785
OVER $500                  5093
$500 AND UNDER             4721
DOMESTIC BATTERY SIMPLE    4656
AUTOMOBILE                 4242
Name: count, dtype: int64
# unique locations (coordinate pairs)
crimes['Location'].unique()
# five most frequent locations
crimes['Location'].value_counts().head(5)
array([nan, '(41.917838056, -87.755968972)',
'(41.995219444, -87.713354912)', ...,
'(41.87253933, -87.640895762)', '(41.817272712, -87.634557976)',
'(41.894226067, -87.619222865)'], dtype=object)
Location
(41.883500187, -87.627876698)    81
(41.868541914, -87.639235361)    80
(41.788987036, -87.74147999)     66
(41.867428687, -87.626342565)    60
(41.963070794, -87.655984213)    59
Name: count, dtype: int64
# unique location descriptions
crimes['Location Description'].unique()
# five most frequent location descriptions
crimes['Location Description'].value_counts().head(5)
array(['OTHER', 'STREET', 'PARKING LOT', 'APARTMENT',
'SMALL RETAIL STORE', 'GAS STATION',
'PARKING LOT / GARAGE (NON RESIDENTIAL)',
'AIRPORT EXTERIOR - NON-SECURE AREA', nan, 'DAY CARE CENTER',
'CREDIT UNION', 'RESIDENCE - GARAGE',
'RESIDENCE - PORCH / HALLWAY', 'CURRENCY EXCHANGE', 'RESIDENCE',
'AUTO / BOAT / RV DEALERSHIP',
'POLICE FACILITY / VEHICLE PARKING LOT', 'DEPARTMENT STORE',
'CHA PARKING LOT / GROUNDS', 'RESTAURANT', 'GROCERY FOOD STORE',
'APPLIANCE STORE', 'OTHER (SPECIFY)',
'RESIDENCE - YARD (FRONT / BACK)', 'ALLEY', 'SIDEWALK',
'VEHICLE NON-COMMERCIAL', 'VACANT LOT / LAND', 'BAR OR TAVERN',
'CAR WASH', 'HOSPITAL BUILDING / GROUNDS',
'COMMERCIAL / BUSINESS OFFICE', 'DRIVEWAY - RESIDENTIAL',
'PARK PROPERTY', 'BANK', 'DRUG STORE',
'LAKEFRONT / WATERFRONT / RIVERBANK', 'SCHOOL - PUBLIC BUILDING',
'AIRPORT TERMINAL LOWER LEVEL - SECURE AREA',
'NURSING / RETIREMENT HOME', 'HOTEL / MOTEL', 'CONVENIENCE STORE',
'CTA BUS STOP', 'AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA',
'GOVERNMENT BUILDING / PROPERTY', 'TAVERN / LIQUOR STORE',
'CTA PLATFORM', 'COLLEGE / UNIVERSITY - RESIDENCE HALL',
'AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA',
'VEHICLE - COMMERCIAL', 'SCHOOL - PUBLIC GROUNDS', 'WAREHOUSE',
'CTA TRAIN', 'CTA BUS', 'ATM (AUTOMATIC TELLER MACHINE)',
'AIRPORT TERMINAL UPPER LEVEL - SECURE AREA', 'CONSTRUCTION SITE',
'AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA', 'ATHLETIC CLUB',
'CHURCH / SYNAGOGUE / PLACE OF WORSHIP', 'CTA STATION',
'CHA APARTMENT', 'CEMETARY', 'ABANDONED BUILDING',
'CHA HALLWAY / STAIRWELL / ELEVATOR',
'OTHER RAILROAD PROPERTY / TRAIN DEPOT', 'CHA GROUNDS', 'LIBRARY',
'BOAT / WATERCRAFT',
'VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER, ETC.)',
'AIRPORT BUILDING NON-TERMINAL - SECURE AREA',
'SPORTS ARENA / STADIUM', 'AIRCRAFT',
'CTA PARKING LOT / GARAGE / OTHER PROPERTY', 'BARBERSHOP',
'SCHOOL - PRIVATE GROUNDS', 'CTA TRACKS - RIGHT OF WAY',
'GAS STATION DRIVE/PROP.', 'TAXICAB', 'ANIMAL HOSPITAL',
'SCHOOL - PRIVATE BUILDING', 'MEDICAL / DENTAL OFFICE',
'OTHER COMMERCIAL TRANSPORTATION', 'AIRPORT PARKING LOT',
'CASINO/GAMBLING ESTABLISHMENT', 'MOVIE HOUSE / THEATER',
'CLEANING STORE', 'POOL ROOM', 'FACTORY / MANUFACTURING BUILDING',
'COLLEGE / UNIVERSITY - GROUNDS', 'AIRPORT EXTERIOR - SECURE AREA',
'HIGHWAY / EXPRESSWAY', 'FEDERAL BUILDING', 'HOUSE', 'PAWN SHOP',
'FIRE STATION', 'ROOMING HOUSE', 'VEHICLE - DELIVERY TRUCK',
'HALLWAY', 'AUTO', 'COIN OPERATED MACHINE',
'AIRPORT TRANSPORTATION SYSTEM (ATS)', 'JAIL / LOCK-UP FACILITY',
'BRIDGE', 'PORCH', 'AIRPORT VENDING ESTABLISHMENT',
'AIRPORT/AIRCRAFT', 'GARAGE', 'KENNEL', 'YARD', 'FOREST PRESERVE',
'BOWLING ALLEY', 'AIRPORT TERMINAL MEZZANINE - NON-SECURE AREA',
'VEHICLE - COMMERCIAL: ENTERTAINMENT / PARTY BUS', 'GANGWAY',
'RETAIL STORE', 'VACANT LOT', 'CHA HALLWAY'], dtype=object)
Location Description
STREET                                    18618
APARTMENT                                 11726
RESIDENCE                                  7805
SIDEWALK                                   3811
PARKING LOT / GARAGE (NON RESIDENTIAL)     2300
Name: count, dtype: int64
# unique years in the data
crimes['Year'].unique()
# five years with the most reported crimes
crimes['Year'].value_counts().head(5)
array([2007, 2021, 2023, 2002, 2024, 2022, 2019, 2020, 2011, 2015, 2014,
2018, 2010, 2013, 2004, 2017, 2008, 2016, 2005, 2001, 2012, 2009,
2003, 2006])
Year
2023    43476
2024    20628
2022      402
2021      244
2020      101
Name: count, dtype: int64
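The year counts above are heavily concentrated in 2023 and 2024, which matters when comparing years. As a rough check, using the counts shown in the output and the 65,499-row total:

```python
# Counts copied from the Year value_counts() output above.
year_counts = {2023: 43476, 2024: 20628, 2022: 402, 2021: 244, 2020: 101}
total_rows = 65499

# Share of all rows that fall in 2023 or 2024.
recent = year_counts[2023] + year_counts[2024]
recent_share = recent / total_rows
print(f'{recent} of {total_rows} rows ({recent_share:.1%}) are from 2023-2024')
```

With nearly all rows coming from two years, raw counts for earlier years are probably too small to support trend conclusions on their own.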
# unique values of the Arrest flag
crimes['Arrest'].unique()
# number of crimes with and without an arrest
crimes['Arrest'].value_counts()
array([False, True])
Arrest
False    57302
True      8197
Name: count, dtype: int64
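The two counts above translate directly into an overall arrest rate. A small worked calculation, using only the numbers from the output:

```python
# Counts copied from the Arrest value_counts() output above.
no_arrest = 57302
arrest = 8197

# Fraction of reported crimes that led to an arrest.
arrest_rate = arrest / (no_arrest + arrest)
print(f'Arrest rate: {arrest_rate:.1%}')
```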
X Coordinate and Y Coordinate give each incident's position on a projected map. Their minimum value of 0 in the describe() output below suggests that some entries hold placeholder or unrecorded locations, since valid projected coordinates for Chicago should not be zero.¶
The minimum latitude (36.62) and longitude (-91.69) lie well outside Chicago, whose incidents otherwise fall at roughly 41.6 to 42.1 latitude and -88.0 to -87.5 longitude, so a few entries are geographic outliers. Because I will rely on Location and Location Description instead, I will not clean these coordinate columns.¶
crimes.describe()
| | ID | Beat | District | Ward | Community Area | X Coordinate | Y Coordinate | Year | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 6.549900e+04 | 65499.000000 | 65499.000000 | 65458.000000 | 65463.000000 | 6.476200e+04 | 6.476200e+04 | 65499.000000 | 64762.000000 | 64762.000000 |
| mean | 1.326482e+07 | 1160.510420 | 11.374082 | 23.187357 | 36.178223 | 1.165304e+06 | 1.887743e+06 | 2023.184476 | 41.847548 | -87.668855 |
| std | 9.276374e+05 | 709.939284 | 7.093998 | 13.963394 | 21.681136 | 1.686823e+04 | 3.252139e+04 | 1.400784 | 0.089464 | 0.061182 |
| min | 1.906000e+03 | 111.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000e+00 | 0.000000e+00 | 2001.000000 | 36.619446 | -91.686566 |
| 25% | 1.322308e+07 | 533.000000 | 5.000000 | 10.000000 | 22.000000 | 1.154142e+06 | 1.860389e+06 | 2023.000000 | 41.772342 | -87.709409 |
| 50% | 1.324707e+07 | 1034.000000 | 10.000000 | 23.000000 | 32.000000 | 1.167111e+06 | 1.894108e+06 | 2023.000000 | 41.865145 | -87.661980 |
| 75% | 1.358230e+07 | 1733.000000 | 17.000000 | 34.000000 | 53.000000 | 1.176670e+06 | 1.910796e+06 | 2024.000000 | 41.910858 | -87.627214 |
| max | 1.361962e+07 | 2535.000000 | 31.000000 | 50.000000 | 77.000000 | 1.205119e+06 | 1.951503e+06 | 2024.000000 | 42.022549 | -87.524542 |
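One way to flag suspicious coordinates like the minimum values in this table is a bounding-box check. The sketch below uses a small illustrative sample and an assumed, approximate bounding box for Chicago; the thresholds and the `in_chicago` name are my own illustration, not values from the dataset documentation:

```python
import pandas as pd

# Illustrative sample: rows 0 and 2 come from the head() output, row 1
# combines the minimum values reported by describe().
sample = pd.DataFrame({
    'X Coordinate': [1141387.0, 0.0, 1152781.0],
    'Y Coordinate': [1913179.0, 0.0, 1941458.0],
    'Latitude': [41.917838, 36.619446, 41.995219],
    'Longitude': [-87.755969, -91.686566, -87.713355],
})

# Assumed approximate bounding box for Chicago (hypothetical thresholds).
in_chicago = (
    sample['Latitude'].between(41.6, 42.1)
    & sample['Longitude'].between(-88.0, -87.5)
)

# Rows that fall outside the box are candidate outliers.
outliers = sample[~in_chicago]
print(len(outliers), 'row(s) fall outside the bounding box')
```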
crimes['Location Description'].describe()
count      65084
unique       117
top       STREET
freq       18618
Name: Location Description, dtype: object
crimes['Location'].describe()
count                             64762
unique                            44328
top       (41.883500187, -87.627876698)
freq                                 81
Name: Location, dtype: object
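The `Location` column stores each coordinate pair as a string such as `(41.883500187, -87.627876698)`, so it would need to be parsed before any numeric work. A small sketch, where `parse_location` is a hypothetical helper rather than part of this notebook:

```python
import ast

def parse_location(text):
    """Convert a '(lat, lon)' string into a (float, float) tuple.

    Returns None for anything that is not a string, such as the NaN
    floats pandas uses for the missing entries in this column.
    """
    if not isinstance(text, str):
        return None
    lat, lon = ast.literal_eval(text)
    return float(lat), float(lon)

print(parse_location('(41.883500187, -87.627876698)'))
print(parse_location(float('nan')))
```

On the real data this could be applied with `crimes['Location'].map(parse_location)`.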
crimes['Primary Type'].describe()
count     65499
unique       31
top       THEFT
freq      14831
Name: Primary Type, dtype: object
import matplotlib.pyplot as plt
crimes.boxplot(column='Year')
[Figure: box plot of Year]
Dropping 16 columns that are not necessary for my analysis¶
col_to_drop = ['ID','Case Number','Date','Block','IUCR','Domestic','Beat','District','Ward','Community Area','FBI Code','X Coordinate','Y Coordinate','Updated On','Latitude','Longitude']
#before drop
crimes.shape
crimes = crimes.drop(columns=col_to_drop)
#after drop
crimes.shape
(65499, 22)
(65499, 6)
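An alternative to dropping columns after loading is to read only the needed columns in the first place with the `usecols` parameter of `pd.read_csv`. The sketch below reads from an in-memory string standing in for `crimes.csv`; the sample is abbreviated to a few columns and the `Beat` values are made up for illustration:

```python
import io
import pandas as pd

# Stand-in for crimes.csv: six columns, two of which are not needed.
csv_text = (
    'ID,Primary Type,Description,Arrest,Year,Beat\n'
    '25953,HOMICIDE,FIRST DEGREE MURDER,True,2021,1111\n'
    '13279676,BURGLARY,UNLAWFUL ENTRY,False,2023,2222\n'
)

# Only these columns are parsed; the rest are skipped at read time.
keep = ['Primary Type', 'Description', 'Arrest', 'Year']
subset = pd.read_csv(io.StringIO(csv_text), usecols=keep)
print(subset.shape)
```

This avoids holding unneeded columns in memory, at the cost of listing the columns to keep rather than the ones to drop.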
crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Primary Type          65499 non-null  object
 1   Description           65499 non-null  object
 2   Location Description  65084 non-null  object
 3   Arrest                65499 non-null  bool
 4   Year                  65499 non-null  int64
 5   Location              64762 non-null  object
dtypes: bool(1), int64(1), object(4)
memory usage: 2.6+ MB
Checking out remaining nulls¶
crimes['Primary Type'].isnull().sum()
missing_info = crimes[crimes['Primary Type'].isnull()][['Primary Type','Year','Location','Location Description','Arrest']]
missing_info
0
| Primary Type | Year | Location | Location Description | Arrest |
|---|---|---|---|---|
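The check above covers only `Primary Type`. To cover every remaining column at once, `isnull().sum()` can be applied to the whole frame. A sketch on a small made-up frame that reuses three of the column names:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; NaN marks a missing value.
sample = pd.DataFrame({
    'Primary Type': ['ASSAULT', 'HOMICIDE', 'BURGLARY'],
    'Location Description': ['OTHER', np.nan, 'APARTMENT'],
    'Location': [np.nan, np.nan, '(41.952345086, -87.677975059)'],
})

# Count missing values per column, most affected columns first.
null_counts = sample.isnull().sum().sort_values(ascending=False)
print(null_counts)
```

On the real data the same call would be `crimes.isnull().sum()`.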
crimes.to_csv('crimes_final.csv', header = True, index = False)
EDA Phase 2¶
#Load modules
import pandas as pd
#Load for visuals
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
#Setting up notebook to display multiple output in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
#Read in file from Phase 1
crimes = pd.read_csv('crimes_final.csv')
#What is the shape of the data
crimes.info()
#Look at first five records
crimes.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Primary Type          65499 non-null  object
 1   Description           65499 non-null  object
 2   Location Description  65084 non-null  object
 3   Arrest                65499 non-null  bool
 4   Year                  65499 non-null  int64
 5   Location              64762 non-null  object
dtypes: bool(1), int64(1), object(4)
memory usage: 2.6+ MB
| | Primary Type | Description | Location Description | Arrest | Year | Location |
|---|---|---|---|---|---|---|
| 0 | ASSAULT | SIMPLE | OTHER | False | 2007 | NaN |
| 1 | HOMICIDE | FIRST DEGREE MURDER | STREET | True | 2021 | (41.917838056, -87.755968972) |
| 2 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | 2021 | (41.995219444, -87.713354912) |
| 3 | BURGLARY | UNLAWFUL ENTRY | APARTMENT | False | 2023 | (41.952345086, -87.677975059) |
| 4 | BATTERY | AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MIN... | SMALL RETAIL STORE | True | 2023 | (41.737750767, -87.604855911) |
Using pandas crosstab to count occurrences for each combination of values in the Arrest and Year columns.¶
# pd.crosstab already returns a DataFrame, so no extra wrapping is needed
arrest_by_year = pd.crosstab(crimes['Arrest'], crimes['Year'])
arrest_by_year.head(10)
| Year | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrest | |||||||||||||||||||||
| False | 25 | 37 | 41 | 16 | 31 | 5 | 7 | 16 | 21 | 15 | ... | 40 | 34 | 35 | 58 | 69 | 81 | 195 | 306 | 38232 | 17946 |
| True | 6 | 9 | 2 | 3 | 1 | 0 | 3 | 2 | 8 | 3 | ... | 11 | 6 | 8 | 8 | 13 | 20 | 49 | 96 | 5244 | 2682 |
2 rows × 24 columns
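The crime numbers have fluctuated, and we can also see a notably higher count of crimes where no arrest was made compared to crimes with an arrest. In 2023, only 5,244 of 43,476 reported crimes (about 12%) led to an arrest, and in 2024, only 2,682 of 20,628 (about 13%). Yearly counts in this sample rose each year from 2016 to 2023, but 64,104 of the 65,499 rows are from 2023 and 2024, so the year-to-year totals probably say more about how this extract was sampled than about real changes in crime.¶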
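Because the yearly totals differ so much, arrest rates are easier to compare across years than raw counts. `pd.crosstab` accepts `normalize='columns'`, which divides each column by its total. A sketch on a few made-up rows (not the real data), with the True/False flag mapped to readable labels:

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    'Year':   [2023, 2023, 2023, 2023, 2024, 2024],
    'Arrest': [True, False, False, False, True, False],
})

# Readable row labels instead of True/False.
outcome = sample['Arrest'].map({True: 'Arrest', False: 'No Arrest'})

# Each column (year) sums to 1, so the 'Arrest' row is the arrest rate.
rates = pd.crosstab(outcome, sample['Year'], normalize='columns')
print(rates.loc['Arrest'])
```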
Creating a heatmap of top 10 crimes¶
top_crimes = crimes['Primary Type'].value_counts().head(10).index
filtered_df = crimes[crimes['Primary Type'].isin(top_crimes)]
heatmap_data = filtered_df.pivot_table(
index='Primary Type', columns='Year', values='Description', aggfunc='count', fill_value=0
)
# Plot the heatmap without numbers
sns.heatmap(heatmap_data, cmap="Blues")
plt.title("Top 10 Crimes by Year")
plt.ylabel("Crime Type")
plt.xlabel("Year")
plt.show()
[Figure: heatmap "Top 10 Crimes by Year" (x-axis: Year, y-axis: Crime Type)]
This heatmap indicates that theft peaked in 2023 and was lower in 2024, although the 2024 rows may not cover a full year.¶
Counting Crimes by location¶
I am using plotly.express to find out where crimes take place the most.¶
locations = crimes['Location Description'].value_counts().head(10).reset_index()
locations.columns = ['Location', 'Count']
fig = px.bar(locations, x='Count', y='Location', orientation='h',
title="Top Ten Locations",
labels={'Count': 'Number of Crimes', 'Location': 'Location Description'},
color='Count', color_continuous_scale='Purples')
fig.show()
After exploring the interactive bar chart, we can note that crimes occur most often on the street, with 18,618 reported incidents.¶
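To see which crime types dominate at each kind of location, the two columns can be cross-tabulated. The sketch below uses a handful of made-up rows; on the real data it would be applied to `crimes['Location Description']` and `crimes['Primary Type']`:

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    'Location Description': ['STREET', 'STREET', 'STREET', 'APARTMENT', 'APARTMENT'],
    'Primary Type': ['THEFT', 'MOTOR VEHICLE THEFT', 'THEFT', 'BATTERY', 'BATTERY'],
})

# Rows are location types, columns are crime types, cells are counts.
table = pd.crosstab(sample['Location Description'], sample['Primary Type'])

# Most frequent crime type at each location type.
top_type = table.idxmax(axis=1)
print(top_type)
```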
My next goal is to compare which crimes had no arrest, and which crimes did.¶
# Grouping and counting crimes
arrest_count = crimes.groupby(['Primary Type', 'Arrest']).size().unstack(fill_value=0)
# Naming columns
arrest_count.columns = ['No Arrest', 'Arrest']
# Sort by the 'Arrest' column in descending order
arrest_count = arrest_count.sort_values('Arrest', ascending=False)
#Displaying
arrest_count.head()
| Primary Type | No Arrest | Arrest |
|---|---|---|
| BATTERY | 9330 | 1764 |
| NARCOTICS | 59 | 1245 |
| WEAPONS VIOLATION | 913 | 1155 |
| THEFT | 14012 | 819 |
| OTHER OFFENSE | 3232 | 697 |
I am creating a bar chart for arrested versus non-arrested crimes, to look at percentage.¶
#Plotting graph
crimes['Arrest'].value_counts(normalize=True).plot.bar(title='Percentage of Crimes to Arrest')
plt.show()
[Figure: bar chart "Percentage of Crimes to Arrest" (x-axis: Arrest)]
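Theft is the most reported crime (14,831 incidents) and also has the most incidents without an arrest (14,012). Battery has the most arrests (1,764), while narcotics has the highest arrest rate among the types shown: 1,245 of 1,304 incidents, about 95.5%.¶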
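The arrest rate for each crime type in the table above can be computed directly from its two counts, which separates "most arrests" from "most likely to end in an arrest". A short worked calculation using only the five rows shown:

```python
# (No Arrest, Arrest) counts copied from the arrest_count table above.
counts = {
    'BATTERY': (9330, 1764),
    'NARCOTICS': (59, 1245),
    'WEAPONS VIOLATION': (913, 1155),
    'THEFT': (14012, 819),
    'OTHER OFFENSE': (3232, 697),
}

# Arrest rate = arrests / all reported incidents of that type.
rates = {
    crime: arrests / (no_arrest + arrests)
    for crime, (no_arrest, arrests) in counts.items()
}

# Print crime types from highest to lowest arrest rate.
for crime, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{crime}: {rate:.1%}')
```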
Crime Prevalence by Location Coordinates using a bar chart.¶
crimes['Location'].value_counts().head(10).plot.barh(title='Top 10 Coordinates with Most Crimes')
plt.show()
[Figure: horizontal bar chart "Top 10 Coordinates with Most Crimes" (y-axis: Location)]
With this information, we now know that the coordinate: (41.883500187, -87.627876698) (100-148 N State St Chicago, IL 60602) appears to have the most crime.¶
Zooming into 2015 to present, so I can have a better visualization of the crimes.¶
#Limiting the years
crime_trends = crimes.groupby(['Year', 'Primary Type']).size().unstack(fill_value=0)
crime_trends = crime_trends[crime_trends.index >= 2015]
#plotting
crime_trends.plot(figsize=(12, 6), colormap='rainbow', linewidth=4)
plt.title("Crime Trends from 2015 to 2024", fontsize=15)
plt.xlabel("Year")
plt.ylabel("Number of Crimes")
plt.legend(title="Crime Type", bbox_to_anchor=(1.01, 1), loc='upper left')
plt.grid(axis='y', alpha=0.7)
plt.show()
[Figure: line chart "Crime Trends from 2015 to 2024" (x-axis: Year, y-axis: Number of Crimes, legend: Crime Type)]
Some crimes, such as battery, assault, and burglary, follow consistent patterns or rise in most years. The sharp growth in 2023 and 2024 could stem from several independent factors, such as changes in how incidents are recorded, how this extract was sampled, population growth, or broader social changes.¶
crime_counts = crimes['Primary Type'].value_counts()
# Combine smaller categories into "Other"
threshold = 0.02 # Categories at or below 2% will be grouped
total = crime_counts.sum()
crime_counts = crime_counts[crime_counts / total > threshold]
crime_counts['Other'] = total - crime_counts.sum()
# One pastel color per wedge, defined before plotting so it can be passed to plt.pie
colors = sns.color_palette('pastel', len(crime_counts))
plt.figure(figsize=(8, 8))
plt.pie(crime_counts, labels=crime_counts.index, autopct='%1.1f%%', startangle=140, labeldistance=1.2, colors=colors)
plt.title('Proportion of Crime Types')
plt.show()
[Figure: pie chart "Proportion of Crime Types": THEFT 22.6%, BATTERY 16.9%, CRIMINAL DAMAGE 10.8%, MOTOR VEHICLE THEFT 9.5%, ASSAULT 8.9%, DECEPTIVE PRACTICE 6.4%, OTHER OFFENSE 6.0%, ROBBERY 4.9%, Other 13.9%]
import plotly.express as px
# Bubble chart of crime counts by year and crime type (bubble size = count)
bubble_data = crimes.groupby(['Year', 'Primary Type']).size().reset_index(name='Count')
fig = px.scatter(bubble_data, x='Year', y='Primary Type', size='Count',
                 color='Primary Type', title='Crime Counts by Year and Type')
fig.show()
import pandas as pd
# Group data by location and year
location_year_data = crimes.groupby(['Location Description', 'Year']).size().unstack(fill_value=0)
location_year_data = location_year_data.loc[["STREET", "RESIDENCE", "SIDEWALK", "PARKING LOT", "ALLEY"]]
# Plot stacked bar chart
location_year_data.T.plot(kind='bar', stacked=True, figsize=(12, 8))
plt.title('Crimes by Location Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Crimes')
plt.legend(title='Location Description')
plt.show()
[Figure: stacked bar chart "Crimes by Location Over Time" (x-axis: Year, y-axis: Number of Crimes, legend: Location Description)]
Summary¶
These visualizations have been very helpful in showing which crimes are most prevalent, which places to avoid when moving, and which locations and crime types to be more aware of. I expected the more violent crimes to be much more common in the area I was investigating, and I was shocked to see how many crimes did not result in an arrest (57,302 of 65,499, about 87.5%).¶
#write out final files for EDA 2
crimes.to_csv('crimes_final.csv',header = True, index = False)